
Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Jan 23, 2026

What does this PR do?

Type of change: New feature

Overview:

This PR adds HuggingFace checkpoint export support for LTX-2 by treating TI2VidTwoStagesPipeline as a diffusion-like pipeline, exporting only the stage-1 transformer (with QKV-fusion-enabled dummy inputs), and falling back to writing model.safetensors when save_pretrained isn't available. It also preserves the original forward in DynamicModule patching (_forward_pre_dm) so downstream callers can still invoke the pre-patched forward implementation.

Changes

  1. Added calibration and quantization support for LTX-2, including FP8 precision.
  2. Preserve the original forward before DynamicModule patching: when patching forward, we now stash the pre-patched implementation in self._forward_pre_dm (once) so downstream code can still call the original forward, then re-bind forward to the class implementation. This is needed for the LTX-2 FP8 calibration (see the first sketch after this list).
  3. Added LTX-2 HF export path: export_hf_checkpoint() now also treats ltx_pipelines.ti2vid_two_stages.TI2VidTwoStagesPipeline as a "diffusion-like" object and routes it through _export_diffusers_checkpoint() (import guarded; no hard dependency).
  4. Generalized component discovery: introduced get_diffusion_components() (aliasing the old get_diffusers_components) to support non-diffusers pipelines; for LTX-2 it returns only stage_1_transformer (see the detection sketch below).
  5. Enabled QKV fusion for the LTX-2 backbone: added a model-aware dummy forward generator (generate_diffusion_dummy_forward_fn) that builds minimal LTX Modality inputs (including correct timesteps broadcasting) so shared-input hooks can run and fuse QKV when applicable (see the dummy-forward sketch below).
  6. Export fallback for modules without save_pretrained: when a component lacks save_pretrained (e.g., the LTX-2 transformer), export now writes model.safetensors plus a minimal config.json instead of pytorch_model.bin.
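
A minimal sketch of the forward-preservation pattern from item 2, for reviewers skimming the diff. Only the _forward_pre_dm name and the re-binding behavior come from this PR; the class and method names below are illustrative stand-ins, not the actual DynamicModule code:

import types

class PatchedModule:  # illustrative stand-in for a DynamicModule subclass
    def forward(self, x):
        return x  # class-level (patched) implementation

    def _patch_forward(self):
        # Stash the pre-patched bound forward exactly once so downstream
        # code (e.g. the LTX-2 FP8 calibration) can still call the original.
        if not hasattr(self, "_forward_pre_dm"):
            self._forward_pre_dm = self.forward
        # Re-bind forward to the class implementation.
        self.forward = types.MethodType(type(self).forward, self)

Downstream code can then call module._forward_pre_dm(...) to reach the pre-patched behavior.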
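
A sketch of the import-guarded detection and component discovery from items 3 and 4. The module path and class name come from the PR; the stage_1_transformer attribute access and the diffusers components fallback are assumptions about the pipelines' layout:

def get_diffusion_components(pipe) -> dict:
    """Return the exportable components of a diffusion-like pipeline."""
    try:
        from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
    except ImportError:  # import guarded: no hard dependency on ltx_pipelines
        TI2VidTwoStagesPipeline = None

    if TI2VidTwoStagesPipeline is not None and isinstance(pipe, TI2VidTwoStagesPipeline):
        # For LTX-2, only the stage-1 transformer is exported.
        return {"stage_1_transformer": pipe.stage_1_transformer}

    # diffusers pipelines expose their parts through the components dict.
    return dict(getattr(pipe, "components", {}))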
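
Finally, the rough shape of the dummy-forward generator from item 5. The input construction and call signature below are purely illustrative; the real generate_diffusion_dummy_forward_fn builds LTX Modality objects with the correct shapes:

import torch

def generate_diffusion_dummy_forward_fn(model):
    """Return a zero-argument callable that runs one dummy forward pass."""

    def dummy_forward():
        batch = 1
        # Hypothetical latent input: a short token sequence.
        hidden_states = torch.randn(batch, 16, 64)
        # Broadcast a scalar timestep across the batch so the conditioning
        # path sees valid values during the tracing pass.
        timesteps = torch.zeros((), dtype=torch.float32).expand(batch)
        with torch.no_grad():
            model(hidden_states, timesteps)

    return dummy_forward

Running the returned callable once lets shared-input hooks observe that the Q/K/V projections consume the same activation, which is what enables QKV fusion.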

Plans

  • [1/4] Add the basic functionalities to support limited image models with NVFP4 + FP8, with some refactoring on the previous LLM code and the diffusers example. PIC: @jingyu-ml
  • [2/4] Add support to more video gen models. PIC: @jingyu-ml
  • [3/4] Add test cases, refactor on the doc, and all related README. PIC: @jingyu-ml
  • [4/4] Add the final support to ComfyUI. PIC: @jingyu-ml

Usage

python quantize.py --model ltx-2 --format fp4 --batch-size 64 --calib-size 1 --n-steps 40 \
  --extra-param checkpoint_path=/home/scratch.omniml_data_2/jingyux/models/LTX-2/ltx-2-19b-dev-fp8.safetensors \
  --extra-param distilled_lora_path=/home/scratch.omniml_data_2/jingyux/models/LTX-2/ltx-2-19b-distilled-lora-384.safetensors \
  --extra-param spatial_upsampler_path=/home/scratch.omniml_data_2/jingyux/models/LTX-2/ltx-2-spatial-upscaler-x2-1.0.safetensors \
  --extra-param gemma_root=/home/scratch.omniml_data_2/jingyux/models/LTX-2/gemma-3-12b-it-qat-q4_0-unquantized \
  --extra-param fp8transformer=true \
  --hf-ckpt-dir ./ltx2-nvfp4

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Added LTX-2 video model support with complete quantization and export pipeline integration
    • Introduced --extra-param CLI option for flexible model configuration and parameter passing
    • Enhanced export capabilities with broader diffusion model compatibility
  • Chores

    • Changed default model data type from Half to BFloat16 for improved numerical stability


Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested review from a team as code owners January 23, 2026 07:43
@jingyu-ml jingyu-ml changed the title from "Jingyux/2 3 diffusion export" to "[2/4] Diffusion Quantized ckpt export" Jan 23, 2026
@jingyu-ml jingyu-ml marked this pull request as ready for review January 23, 2026 22:36
@jingyu-ml jingyu-ml requested a review from a team as a code owner January 23, 2026 22:36
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@codecov

codecov bot commented Jan 24, 2026

Codecov Report

❌ Patch coverage is 57.57576% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.40%. Comparing base (aafd388) to head (bc3e5bb).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
...odelopt/torch/quantization/qtensor/mxfp8_tensor.py 25.00% 57 Missing ⚠️
.../torch/quantization/nn/modules/tensor_quantizer.py 22.22% 7 Missing ⚠️
modelopt/onnx/utils.py 86.95% 3 Missing ⚠️
modelopt/onnx/autocast/convert.py 84.61% 2 Missing ⚠️
modelopt/onnx/quantization/quantize.py 95.65% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #810      +/-   ##
==========================================
+ Coverage   73.31%   73.40%   +0.08%     
==========================================
  Files         192      193       +1     
  Lines       19613    19911     +298     
==========================================
+ Hits        14380    14616     +236     
- Misses       5233     5295      +62     

Comment on lines +879 to +892
else:
    cpu_state_dict = {
        k: v.detach().contiguous().cpu() for k, v in component.state_dict().items()
    }
    save_file(cpu_state_dict, str(component_export_dir / "model.safetensors"))
with open(component_export_dir / "config.json", "w") as f:
    json.dump(
        {
            "_class_name": type(component).__name__,
            "_export_format": "safetensors_state_dict",
        },
        f,
        indent=4,
    )
Contributor

Can we combine these with L851 to L863? They look duplicated.

Why do we need to offload tensors to CPU before saving?

Contributor Author

If we always save with safetensors, keeping the .cpu() is the safe default choice; this is also how transformers/diffusers save_pretrained writes tensors to the safetensors file.

Contributor Author

@jingyu-ml jingyu-ml Jan 26, 2026

Could you clarify more?

Can we combine these with L851 to L863? They look duplicated.

Line 880 saves the state dict to a safetensors file, and line 884 saves the quant config to config.json. We use these two functions only when the model is not diffusers-based.

Contributor

    cpu_state_dict = {
        k: v.detach().contiguous().cpu() for k, v in component.state_dict().items()
    }
    save_file(cpu_state_dict, str(component_export_dir / "model.safetensors"))
with open(component_export_dir / "config.json", "w") as f:
    json.dump(
        {
            "_class_name": type(component).__name__,
            "_export_format": "safetensors_state_dict",
        },
        f,
        indent=4,
    )

I mean this code block appears twice in the same script.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested a review from Edwardf0t1 January 26, 2026 23:00
@jingyu-ml jingyu-ml force-pushed the jingyux/2-3-diffusion-export branch from ef4f814 to 9f0e998 Compare January 27, 2026 09:28
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@jingyu-ml jingyu-ml requested a review from cjluo-nv January 27, 2026 09:53
Collaborator

@ChenhanYu ChenhanYu left a comment

Commented on the dynamic module part.

Contributor

@Edwardf0t1 Edwardf0t1 left a comment

LGTM, left a few more comments.

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>